ActiveTables™ for mining data on the Internet



A wide variety of tabular data exists on the World Wide Web. This ranges from
sports scorecards (NBA, NFL, etc.) to annual financial reports of companies. This
document discusses, in brief, a mechanism for mining these data and providing
additional interesting information about the data to the viewer, within the confines
of a Java enabled web browser.



Problem Definition 

A mechanism has to be developed whereby additional information can be
obtained about the field(s) of a table that the user is viewing, and a way in which
this information can be mined along with additional sources of information, to
provide interesting patterns to the user. Ideally, this information should appear in
a floating balloon, which constantly updates itself, as the user moves his/her
mouse around in the table.

A table is represented in a web page as a series of HTML tags. Consequently,
there is no way in which an application (other than the browser) can tell which
column / row of the table the user's mouse is currently at. Also, there is no
means, currently, of defining additional sources of information.

The ActiveTable™ mechanism is proposed to replace the current "passive"
representation of tables through HTML tags.

What is an ActiveTable™ ?

An ActiveTable™ is an applet that contains the table to be mined. It is embedded
in an HTML page. It is itself a JavaBean, and consists of the following JavaBeans :

(1) A set of GridElements. Each GridElement represents a single cell in the
table. Each grid element "knows" its position (row/column) in the table, as well
as the datum that it contains. Each GridElement is capable of responding to
mouse movements over its visual representation in the ActiveTable™.

(2) One or more buttons, e.g., to turn on/off data mining, to ask user defined
questions, etc.

(3) A Floating Balloon, which would display interesting results from the current
table. This updates continuously, depending on the position of the user's
mouse.

(4) A text area, which would display interesting data mining results and answers
to user questions by combining the contents of the current table with other
data from the web.

(5) Some other elements of pure "decorative" value, such as horizontal and
vertical lines, bitmaps, etc.
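
As a rough illustration of item (1), the sketch below shows how a GridElement
bean might record its position and datum and notify a listener (standing in for
the floating balloon) when the mouse enters it. The class layout, the "hovered"
property, and the use of PropertyChangeSupport are assumptions made for this
example and are not part of the proposal; in an applet, a mouse listener on the
cell's visual representation would be the one calling setHovered.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.Serializable;

// Hypothetical sketch of one table cell; all names are illustrative.
class GridElement implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int row;        // 1-based row in the table
    private final int column;     // 1-based column in the table
    private final String datum;   // the value shown in the cell
    private boolean hovered;      // true while the mouse is over the cell
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);

    GridElement(int row, int column, String datum) {
        this.row = row;
        this.column = column;
        this.datum = datum;
    }

    int getRow() { return row; }
    int getColumn() { return column; }
    String getDatum() { return datum; }
    boolean isHovered() { return hovered; }

    // Called when the mouse enters or leaves the cell; listeners (for
    // example the balloon) are notified only when the state changes.
    void setHovered(boolean newValue) {
        boolean oldValue = this.hovered;
        this.hovered = newValue;
        changes.firePropertyChange("hovered", oldValue, newValue);
    }

    void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }
}
```

A balloon component could register itself as a listener on every GridElement
and rebuild its text from getRow(), getColumn() and getDatum() each time a cell
reports that it is hovered.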

Creating an ActiveTable™

An ActiveTable™ is a JavaBean, and can be inserted into a web page (?), or any
other Java application, just like any other JavaBean, using a visual builder tool
(such as IBM VisualAge, or Symantec Visual Cafe).

When a developer instantiates an ActiveTable™, a set of wizards appear,
enabling him/her to customize the ActiveTable™. The wizards would require
answers to questions such as the following :

(1) What is the primary data source of the ActiveTable™ ? (The primary data
source of an ActiveTable™ refers to the source for the data actually displayed
to the user by the ActiveTable™, in its GridElements. This could be a subset of
or an entire text file, ODBC table, etc.) This information is required.

(2) What are the secondary data sources, if any ? (Secondary data sources
refer to additional data sources that can provide more information to be
mined. These could be text files, ODBC tables, text reports, video clips, etc.)
This information is optional.

Once the primary data source for an ActiveTable™ has been identified, an applet
representing it will be built by the builder tool, and inserted into the specified
HTML page.

How will the ActiveTable™ work ?

When the user sees the HTML page, the ActiveTable™ applet will be invoked
automatically by the browser. When she moves her mouse around on the table,
the particular GridElement on which the mouse is located will respond. The
GridElement will perform some standard calculations, and update the floating
balloon. For example, if the user's mouse is on the FGM column for Michael
Jordan in the following table (row = 2, column = 2), the appropriate GridElement
would respond.



    Player    FGM    FGA
    Jordan      5     10
    Pippen      7     10
    Kerr        6      8

Since the GridElement "knows" that it belongs to the 2nd row and 2nd column, it
could display a message such as the following :

"Did you know that Michael Jordan made 5 out of the Bulls' 18 Field Goals (28
%) ?"
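
The domain-independent calculation behind such a message can be sketched as
follows: the cell's value is compared with the total of its column. The class
and method names are hypothetical, and the message wording is only one possible
phrasing.

```java
import java.util.Locale;

// Hypothetical helper: share of a column total held by one cell.
class ColumnShare {
    static int total(int[] column) {
        int sum = 0;
        for (int value : column) {
            sum += value;
        }
        return sum;
    }

    // Returns the cell's share of the column total, rounded to a whole percent.
    static long percentOfColumn(int[] column, int rowIndex) {
        int sum = total(column);
        if (sum == 0) {
            throw new IllegalArgumentException("column total is zero");
        }
        return Math.round(100.0 * column[rowIndex] / sum);
    }

    // Builds a generic balloon message from the row label and column label.
    static String message(String rowLabel, String columnLabel, int[] column, int rowIndex) {
        return String.format(Locale.ROOT,
                "Did you know that %s accounts for %d of the %d %s (%d %%) ?",
                rowLabel, column[rowIndex], total(column), columnLabel,
                percentOfColumn(column, rowIndex));
    }
}
```

With the FGM column {5, 7, 6} and the first data row selected, percentOfColumn
yields 28, matching the figure quoted above.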

Alternatively, the GridElement could act more intelligently. Since it "knows" it
belongs to the "FGM" column, it could query the "FGA" column, and come up
with :

"Did you know that Michael Jordan shot 5 for 10 (50 %), whereas the Bulls shot
18 for 28 (64 %) ?"

Note that the first example could always work, in any domain, whereas the
second represents a customization for a specific (in this case, basketball)
domain.
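
The domain-specific variant can be sketched in the same way: the cell pairs its
own "made" column with the related "attempted" column and compares the player's
ratio with the team's. The names below are hypothetical, and the knowledge that
FGM and FGA belong together is assumed to come from the developer's
customization.

```java
import java.util.Locale;

// Hypothetical basketball-specific helper pairing the FGM and FGA columns.
class ShootingSummary {
    static int total(int[] column) {
        int sum = 0;
        for (int value : column) {
            sum += value;
        }
        return sum;
    }

    // Shooting percentage rounded to a whole percent.
    static long percent(int made, int attempted) {
        if (attempted == 0) {
            throw new IllegalArgumentException("no attempts");
        }
        return Math.round(100.0 * made / attempted);
    }

    static String message(String player, String team, int[] fgm, int[] fga, int rowIndex) {
        int teamMade = total(fgm);
        int teamAttempted = total(fga);
        return String.format(Locale.ROOT,
                "Did you know that %s shot %d for %d (%d %%), whereas the %s shot %d for %d (%d %%) ?",
                player, fgm[rowIndex], fga[rowIndex], percent(fgm[rowIndex], fga[rowIndex]),
                team, teamMade, teamAttempted, percent(teamMade, teamAttempted));
    }
}
```

Called with the table's FGM column {5, 7, 6}, FGA column {10, 10, 8} and the
first data row, this reproduces the 50 % and 64 % figures of the second message.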

When the user clicks on the Data Mining button, the ActiveTable™ would look up
data from the secondary sources specified by the developer, and mine the
results together. (The developer would have to provide some additional
information while customizing the ActiveTable™, such as the Focus Attributes,
etc.)

The user could click on the Ask Questions button, to ask her own questions.
These questions would also be fired across all the (primary and secondary) data
sources associated with the ActiveTable™.

A typical DM / user question answer would be displayed in the answer text area
and would look like this : "Did you know that in this season, against the Knicks, in
the games in which Jordan had more than 10 rebounds, the Bulls won 3 of 4
games (75%) ?"

Other issues :

(1) The ActiveTable™ JavaBean can be converted into an ActiveX control, thus
enabling the same functionality from within MS-Office containers, such as
MS-Word, MS-Excel, etc.

(2) The ActiveTable™ could first be made available in its entirety to developers.
The developer would be able to customize an ActiveTable™ by specifying
different data sources, etc. Later, the various parts of the ActiveTable™
(buttons, text areas, etc.) could be made available individually so that
developers can customize the look and feel of the entire ActiveTable™ to their
needs.

(3) Developers currently specify text file / ODBC data sources as the secondary
data source. In order to mine data, the current ActiveTable™ has to query / import
data from these sources. This has various disadvantages, e.g., it assumes
the existence of a database server (in case of ODBC), and it also involves
querying, and sometimes, parsing files for importing data. However, if these
secondary data sources are themselves represented using ActiveTables™
(i.e., they are the primary sources of other ActiveTables™), then, using the
RMI (Remote Method Invocation) mechanism available in Java, it might
be possible to query other ActiveTables™ for secondary data, thus ensuring
truly distributed computing. For example, consider the following scenario :

In the above example, the user clicks on data mining, while viewing Jordan's
[...]

"[...] the Bulls against the Knicks,
please respond with some data (related to shooting) for the game."

This query goes out to all ActiveTables™ on the NBA server, and evokes
responses from all the ActiveTables™ with primary data related to Bulls-Knicks
games. The client ActiveTable™ can then use these data to perform some
analysis, and show the results to the user.
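
The query/response exchange described above might be shaped roughly as in the
sketch below. The interface extends java.rmi.Remote, as RMI requires, but every
name in it (RemoteActiveTable, respond, the topic string) is an assumption made
for illustration. The sketch calls the tables in-process; exporting them and
looking them up through an RMI registry is deliberately left out, since whether
that is feasible is exactly what remains to be studied.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote interface that a server-side ActiveTable could expose.
interface RemoteActiveTable extends Remote {
    // Returns rows of primary data matching the topic, or an empty list.
    List<String> respond(String topic) throws RemoteException;
}

// In-memory stand-in for a server-side ActiveTable (not exported via RMI here).
class InMemoryActiveTable implements RemoteActiveTable {
    private final String topic;
    private final List<String> rows;

    InMemoryActiveTable(String topic, List<String> rows) {
        this.topic = topic;
        this.rows = new ArrayList<>(rows);
    }

    @Override
    public List<String> respond(String requestedTopic) {
        List<String> result = new ArrayList<>();
        if (topic.equals(requestedTopic)) {
            result.addAll(rows);
        }
        return result;
    }
}

// Client side: send the same query to every known table and pool the answers.
class ClientActiveTable {
    static List<String> gather(List<RemoteActiveTable> tables, String topic) {
        List<String> collected = new ArrayList<>();
        for (RemoteActiveTable table : tables) {
            try {
                collected.addAll(table.respond(topic));
            } catch (RemoteException e) {
                // A table that cannot be reached is skipped in this sketch.
            }
        }
        return collected;
    }
}
```

Only tables whose primary data match the topic contribute rows, which mirrors
the idea that the query reaches every ActiveTable™ but evokes responses only
from the relevant ones.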

Some comments on this mechanism :

a. It is totally independent of the way the data are stored on the server (text files,
ODBC, etc.)

b. It does not overload any database server (however, it does cause additional
burden to the HTTP server). From the point of view of security, a large
company would prefer requests to the HTTP server rather than allowing
millions of users direct access to their databases.

c. It involves minimal server-side programming (just setting up the mechanism
by which ActiveTables™ could respond to these remote requests)

d. Data within the ActiveTable™ could be intelligently indexed for faster retrieval

e. Information obtained from various ActiveTables™ could be cached by the
client ActiveTable™ for re-use

f. However, it does not make use of the database server's inherent optimization
with respect to SQL queries.
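
Comment e can be illustrated with a small client-side cache keyed by the query
topic. This is a minimal sketch under assumed names (ResponseCache, a fetcher
function standing in for the remote call); it ignores expiry, which a real
client would need once the underlying tables change.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache of responses already obtained from other ActiveTables.
class ResponseCache {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Function<String, List<String>> fetcher; // stands in for the remote query
    private int fetchCount;                               // number of real fetches performed

    ResponseCache(Function<String, List<String>> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the cached response for the topic, fetching it only on first use.
    List<String> get(String topic) {
        return cache.computeIfAbsent(topic, key -> {
            fetchCount++;
            return fetcher.apply(key);
        });
    }

    int getFetchCount() {
        return fetchCount;
    }
}
```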

Unclear at this point : 

a. Once the ActiveTable™ has been customized by the developer, it can be
saved to a ".ser" file. This will ensure that it retains all the properties that the
developer has set. The ".ser" file can then be opened by a browser, like an
applet, through an extension to HTML. However, as of now, no browser
supports this serialized version of Java class files.

b. RMI requires more study to determine that all that has been mentioned
above is, in reality, possible to implement.
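
The saving step in item a relies on standard Java object serialization. The
sketch below round-trips a small settings object through a byte array; writing
the same bytes to a file would give a ".ser" file. TableSettings and
SettingsStore are hypothetical names, and the two fields merely echo the
primary and secondary data sources chosen in the wizard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical holder for the properties a developer sets while customizing.
class TableSettings implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String primarySource;
    private final String[] secondarySources;

    TableSettings(String primarySource, String[] secondarySources) {
        this.primarySource = primarySource;
        this.secondarySources = secondarySources.clone();
    }

    String getPrimarySource() { return primarySource; }
    String[] getSecondarySources() { return secondarySources.clone(); }
}

class SettingsStore {
    // Serializes the settings; the bytes are what a ".ser" file would contain.
    static byte[] save(TableSettings settings) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(settings);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores the settings with all properties as they were when saved.
    static TableSettings load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TableSettings) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```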



Phase 1 of the project :

• Develop the beans for implementation of ActiveTable™.

• Implement the customizer for the ActiveTable™, where the ActiveTable™ is
based on a single data source. (No filter for the data)

• Make the grids of the ActiveTable™ respond to mouse movements and pop
up with simple but interesting statistical calculations on the data from the
primary data source of the ActiveTable™.

• Start working on the re-design of the data mining algorithm, and the
presentation of data mining results.

ActiveTable™ Design

The ActiveTable™ will appear as shown below :